Abstract. - Randomized network ensembles are the null models of real networks and are
extensively used to compare a real system to a null hypothesis. In this paper we study
network ensembles with the same degree distribution, the same degree correlations and
the same community structure as a given real network. We characterize these randomized
network ensembles by their entropy, i.e. the normalized logarithm of the total number of
networks that belong to them. We estimate the entropy of randomized ensembles starting
from a large set of real directed and undirected networks. We propose entropy as an
indicator to assess the role of each structural feature in a given real network. We
observe that ensembles with a fixed scale-free degree distribution have smaller entropy
than ensembles with a homogeneous degree distribution, indicating a higher level of
order in scale-free networks.
Introduction. - The complexity of a network [1] depends on its global structural
organization, which is linked to the functional constraints the network has to satisfy.
Real networks show different levels of organization. To characterize their structure a
few different quantities have been proposed: (i) the density of the links, (ii) the
degree sequence [2], (iii) the degree-degree correlations [3-5], (iv) the clustering
coefficient [6, 7], (v) the k-core structure [8-10] and finally (vi) the community
structure [11-14].

To study the different information content retained by these structural quantities we
will consider randomized network models, which are best studied by statistical mechanics
methods. Among the different statistical mechanics approaches to networks [15, 16], one
has been proposed [17, 18] for networks with hidden variables $\theta_i$ associated to
each node i of the network. In the same framework it has been shown in [19] that the
probability of a link should satisfy specific forms in order to guarantee good inference
of the hidden variables.

Every real network can be considered as a specific instance of a particular network
evolution compatible with its functional constraints. Nevertheless in many cases real
networks are not determined exactly by their evolution. We propose here to consider a
real network as belonging to an ensemble of networks which would perform the same task
equally well. For example in the biological world we observe a certain variability of
biological networks across different species with the same biological function. The
complexity of a given ensemble of networks increases as the number of networks in the
ensemble decreases. Consequently a high complexity of the network ensemble corresponds
to a small variability of the networks in the ensemble. The entropy $\Sigma$ of a given
network ensemble [20] is proportional to the logarithm of the number of networks
belonging to the ensemble. We expect that a very complex network belongs to an ensemble
of functionally equivalent networks of small entropy. Since it is difficult to
characterize the minimal entropy ensemble a real network belongs to, we take successive
approximations of the real network.

To characterize the complexity of a real network we con- 
sider a series of randomized network models which retain 
some characteristics of the real networks. In particular 
we consider networks with a given degree sequence, given 
degree-degree correlations and a given community struc- 
ture. Degree-degree correlation [5] has been considered a 
signature of non randomness in the topology of the net- 
works. The correlations have been shown to be important 
in the Internet at the Autonomous System Level [3] and 
in biological networks [4] where the degree correlations are 
linked also to the modular structure [7] of the network. 

In our approach we will first consider a particular real network to be part of the
ensemble of networks with the same number of nodes N and links L as the real network.
This network ensemble is the G(N, L) ensemble studied by the random graph community.
Subsequently we consider the configuration model of networks with given degree sequence,
which restricts the number of possible networks. Furthermore we consider the ensemble
of networks with a given degree sequence and with given degree correlations or with
given community structure, which further restricts the space of possible networks.
Finally we will consider the ensemble of networks with given community structure and
degree sequence. How much information is carried by each of these ensembles? This paper
tries to answer this question by calculating the entropies of these ensembles, which
successively approximate the real network.

The ensemble of networks with a given degree sequence 
falls in the class of hidden variable models [17, 18] with the 
hidden variable being nothing else than the Lagrangian 
multipliers of the connectivity of each node. 

The ensembles of networks with given degree sequence and degree correlations, or with
given degree sequence and given community structure, are generalized hidden variable
models and can also be used to generate networks with given degree-degree correlations
or community structure.

Undirected networks. - Given a real network with N nodes and adjacency matrix
$(a_{ij})$, $i, j = 1, \ldots, N$, we construct successive randomized network ensembles.
For an undirected network the first ensemble (zero order approximation) is the G(N, L)
ensemble of networks with given number of nodes N and links $L = \sum_{i,j} a_{ij}/2$.
The first order approximation is the configuration model with given degree sequence
$\{k_1, \ldots, k_N\}$ with $k_i = \sum_j a_{ij}$. The second order approximation is the
ensemble with given degree sequence $\{k_1, \ldots, k_N\}$ and given average nearest
neighbour connectivity
$k_{nn}(k) = \frac{1}{k N_k} \sum_i \delta(k_i - k) \sum_j a_{ij} k_j$.
Moreover one can consider the partition function of the networks with given community
structure, i.e. with a fixed number of links within each community and between different
communities. If the community of node i is indicated with $q_i$ we can consider graphs
with given $A(q, q') = \sum_{i<j} \delta(q_i - q) \delta(q_j - q') a_{ij}$. The partition
functions of these network ensembles are given by

$$
\begin{aligned}
Z_0 &= \sum_{\{a_{ij}\}} \delta\Big(L - \sum_{i<j} a_{ij}\Big) \exp\Big[\sum_{i<j} h_{ij} a_{ij}\Big] \\
Z_1 &= \sum_{\{a_{ij}\}} \prod_i \delta\Big(k_i - \sum_j a_{ij}\Big) \exp\Big[\sum_{i<j} h_{ij} a_{ij}\Big] \\
Z_2 &= \sum_{\{a_{ij}\}} \prod_k \delta\Big(k_{nn}(k) N_k k - \sum_{i,j} \delta(k_i - k) a_{ij} k_j\Big) \prod_i \delta\Big(k_i - \sum_j a_{ij}\Big) \exp\Big[\sum_{i<j} h_{ij} a_{ij}\Big] \\
Z_c &= \sum_{\{a_{ij}\}} \prod_{q \le q'} \delta\Big(A(q, q') - \sum_{i<j} \delta(q_i - q) \delta(q_j - q') a_{ij}\Big) \prod_i \delta\Big(k_i - \sum_j a_{ij}\Big) \exp\Big[\sum_{i<j} h_{ij} a_{ij}\Big]
\end{aligned}
\qquad (1)
$$
where $h_{ij}$ are auxiliary fields, $N_k$ indicates the number of nodes of degree k in
the network, $N_k = \sum_i \delta(k_i - k)$, the vector $q_i$ indicates to which
community a node belongs, and $A(q, q')$ indicates the number of links between the
community q and the community q'. The probability $p_{ij}$ for a link between node i and
node j (the probability for $a_{ij} = 1$) is given by

$$ p_{ij}^{(k)} = \left. \frac{\partial \ln Z_k}{\partial h_{ij}} \right|_{h_{ij} = 0 \ \forall (i,j)} \qquad (2) $$

The number of undirected simple networks $\mathcal{N}_k$ in each of these ensembles k
is consequently given by

$$ \mathcal{N}_k = \left. Z_k \right|_{h_{ij} = 0 \ \forall (i,j)} \qquad (3) $$



We define the entropy per node $\Sigma_k$ of the network ensemble k as

$$ \Sigma_k = \frac{1}{N} \ln \mathcal{N}_k. \qquad (4) $$
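As a concrete illustration of definitions (3) and (4), the following sketch enumerates by brute force every simple undirected graph on a handful of nodes and counts those that satisfy the G(N, L) constraint or a given degree sequence. The function names and the 4-node example are illustrative choices, not part of the original analysis, and the enumeration is only feasible for very small N.

```python
from itertools import combinations
from math import log

def count_ensembles(degrees):
    """Count graphs with the same N and L, and those with the same degree sequence."""
    n = len(degrees)
    n_links = sum(degrees) // 2
    pairs = list(combinations(range(n), 2))      # all possible links i < j
    n0 = n1 = 0
    for links in combinations(pairs, n_links):   # every simple graph with L links
        n0 += 1
        k = [0] * n
        for i, j in links:
            k[i] += 1
            k[j] += 1
        if k == list(degrees):
            n1 += 1
    return n0, n1

def entropy_per_node(count, n):
    """Eq. (4): Sigma = ln(number of networks) / N."""
    return log(count) / n

degrees = (2, 2, 2, 2)                           # hypothetical 4-node example
n0, n1 = count_ensembles(degrees)
print(n0, n1)                                    # 15 graphs with L = 4, 3 of them are 4-cycles
print(entropy_per_node(n0, 4), entropy_per_node(n1, 4))
```

Fixing the degree sequence shrinks the ensemble from 15 graphs to 3, so the entropy per node drops accordingly.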

The number of undirected networks $\mathcal{N}_0$ with given number of nodes N and
links L is given by the binomial

$$ \mathcal{N}_0 = \binom{N(N-1)/2}{L} \qquad (5) $$

for distinguishable nodes in the networks [20]. The probability $p_{ij}$ of a given
link (i, j) is given by $p_{ij}^{(0)} = L/(N(N-1)/2)$ for every pair of nodes i, j.
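Equations (4) and (5) can be evaluated directly for networks of realistic size by working with log-factorials. The sketch below uses Python's `math.lgamma`. The function names are ours, and the sizes in the usage line are those listed for the AS-97-11 snapshot in table 1.

```python
from math import lgamma

def log_binomial(m, l):
    """ln of the binomial coefficient C(m, l), via log-gamma to avoid overflow."""
    return lgamma(m + 1) - lgamma(l + 1) - lgamma(m - l + 1)

def entropy_gnl(n_nodes, n_links):
    """Entropy per node of G(N, L): Sigma_0 = ln C(N(N-1)/2, L) / N."""
    n_pairs = n_nodes * (n_nodes - 1) // 2       # number of unordered node pairs
    return log_binomial(n_pairs, n_links) / n_nodes

# AS-97-11 in table 1: N = 3015 nodes, L = 5156 links
print(entropy_gnl(3015, 5156))
```

The printed value should come out close to the zero order entropy reported for this network in table 1.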

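The volume of the network ensemble with given degree sequence. The first level of
approximation is the one in which a given degree sequence is assumed. In the undirected
simple case the partition function of the network ensemble with given degree sequence
is given by

$$ Z_1 = \sum_{\{a_{ij}\}} \prod_i \delta\Big(k_i - \sum_j a_{ij}\Big) \exp\Big[\sum_{i<j} h_{ij} a_{ij}\Big]. \qquad (6) $$

Expressing the deltas in integral form with Lagrangian multipliers $\omega_i$ for every
$i = 1, \ldots, N$ we get

$$ Z_1 = \int \mathcal{D}\omega \, e^{-\sum_i \omega_i k_i} \prod_{i<j} \left(1 + e^{\omega_i + \omega_j + h_{ij}}\right) \qquad (7) $$

where $\mathcal{D}\omega = \prod_i d\omega_i/(2\pi)$. We solve this integral by the
saddle point method. The entropy of this ensemble of networks can be approximated in the
large network limit by

$$ N \Sigma_1^{und} \simeq -\sum_i \omega_i^* k_i + \sum_{i<j} \ln\left(1 + e^{\omega_i^* + \omega_j^*}\right) - \frac{1}{2} \sum_i \ln(2\pi\alpha_i) \qquad (8) $$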



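with the Lagrangian multipliers $\omega_i^*$ satisfying the saddle point equations

$$ k_i = \sum_{j \ne i} \frac{e^{\omega_i^* + \omega_j^*}}{1 + e^{\omega_i^* + \omega_j^*}} \qquad (9) $$

and the coefficients $\alpha_i$ defined as

$$ \alpha_i = \sum_{j \ne i} \frac{e^{\omega_i^* + \omega_j^*}}{\left(1 + e^{\omega_i^* + \omega_j^*}\right)^2}. \qquad (10) $$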
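The saddle point equations can be solved numerically. The following is a minimal sketch that assumes a damped fixed-point iteration on $x_i = e^{\omega_i}$, which is one possible solver and not necessarily the one used for the results reported here. The function names, the damping factor and the regular example sequence are our own choices. All degrees are assumed to be positive, and convergence is not guaranteed for every degree sequence.

```python
from math import exp, log, log1p, pi, sqrt

def solve_saddle_point(degrees, n_iter=5000, damping=0.5, tol=1e-12):
    """Solve the saddle point equations for x_i = exp(omega_i) by damped iteration."""
    n = len(degrees)
    total = sum(degrees)
    x = [k / sqrt(total) for k in degrees]        # sparse-limit starting guess
    for _ in range(n_iter):
        x_new = []
        for i in range(n):
            # rewrite k_i = sum_j x_i x_j / (1 + x_i x_j) as x_i = k_i / sum_j x_j / (1 + x_i x_j)
            denom = sum(x[j] / (1.0 + x[i] * x[j]) for j in range(n) if j != i)
            x_new.append(damping * x[i] + (1.0 - damping) * degrees[i] / denom)
        shift = max(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if shift < tol:
            break
    return [log(v) for v in x]

def link_probability(omega, i, j):
    """p_ij = e^{omega_i + omega_j} / (1 + e^{omega_i + omega_j})."""
    return 1.0 / (1.0 + exp(-(omega[i] + omega[j])))

def entropy_per_node(degrees, omega):
    """Saddle point entropy of eq. (8) divided by N, with alpha_i as defined above."""
    n = len(degrees)
    total = -sum(w * k for w, k in zip(omega, degrees))
    for i in range(n):
        alpha_i = 0.0
        for j in range(n):
            if j == i:
                continue
            p = link_probability(omega, i, j)
            alpha_i += p * (1.0 - p)              # e^s / (1 + e^s)^2 = p (1 - p)
            if i < j:
                total += -log1p(-p)               # ln(1 + e^s) = -ln(1 - p)
        total -= 0.5 * log(2.0 * pi * alpha_i)
    return total / n

degrees = [2] * 10                                # hypothetical regular degree sequence
omega = solve_saddle_point(degrees)
expected = [sum(link_probability(omega, i, j) for j in range(10) if j != i)
            for i in range(10)]
print(expected)                                   # should reproduce the input degrees
print(entropy_per_node(degrees, omega))
```

For a regular sequence the solution is uniform, so every pair of nodes is linked with probability k/(N-1), which gives a simple check on the solver.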
The probability of a link (i, j) in this ensemble is then given by

$$ p_{ij}^{(1)} = \frac{e^{\omega_i^* + \omega_j^*}}{1 + e^{\omega_i^* + \omega_j^*}} \qquad (11) $$

recovering the hidden variable ensemble [17, 19]. In particular in this ensemble
$p_{ij} \ne f(\omega_i) f(\omega_j)$; consequently the model retains some 'natural'
correlations [19] given by the degree sequence and the constraint that we consider only
simple networks. These are in fact nothing else than the correlations of the
configuration model [21]. Nevertheless we can consider the case in which the network is
sparse and there is a structural cutoff in the system,
$k_i < \sqrt{\langle k \rangle N}$. In this case we can approximate Eq. (9) by
$e^{\omega_i^*} \simeq k_i/\sqrt{\langle k \rangle N}$, $\alpha_i \simeq k_i$. In this
limit the network is not correlated,
$p_{ij}^{(1),uncorr} = k_i k_j/(\langle k \rangle N)$, $\omega_i^* < 0$, and we can
approximate the entropy of the ensemble as



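$$ N \Sigma_{1,uncorr}^{und} \simeq -\sum_i (\ln k_i - 1) k_i - \frac{1}{2} \sum_i \ln(2\pi k_i) + \frac{1}{2} \langle k \rangle N \left[\ln(\langle k \rangle N) - 1\right] - \frac{1}{4} \left(\frac{\langle k^2 \rangle}{\langle k \rangle}\right)^2 \qquad (12) $$

which approximately gives for the volume

$$ \mathcal{N}_1^{uncorr} \simeq \frac{(\langle k \rangle N)!!}{\prod_i k_i!} \exp\left[-\frac{1}{4} \left(\frac{\langle k^2 \rangle}{\langle k \rangle}\right)^2\right]. \qquad (13) $$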



The expression (13) was already derived in [22] by combinatorial considerations valid in
a network with structural cutoff. In fact the term
$(\langle k \rangle N)!! \simeq (\langle k \rangle N - 1)!!$ gives the total number of
different ways we can link the $2L = \langle k \rangle N$ half-edges associated to a
degree sequence to form a network. We can take a first half-edge of the network and we
have $2L - 1$ choices to match it with one of the other half-edges; then we can take
another half-edge and we have $2L - 3$ possible choices of other half-edges to link it
to, and so on, giving rise to $(2L - 1)!!$ networks. Only a part of these networks is
simple, which provides the correction
$\exp\left[-\frac{1}{4} \left(\frac{\langle k^2 \rangle}{\langle k \rangle}\right)^2\right]$
[22]. Among these simple networks, for each distinct adjacency matrix there are
$\prod_i k_i!$ networks that can be constructed by simply permuting the order of the
edges at each node.
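The half-edge counting argument can be checked by explicit enumeration for a small number of half-edges. The sketch below, with function names of our own choosing, counts all matchings recursively and compares the result with $(2L - 1)!!$.

```python
from math import prod

def count_pairings(stubs):
    """Count the ways to match a list of half-edges (stubs) into pairs, by recursion."""
    if not stubs:
        return 1
    rest = stubs[1:]
    # the first stub can be matched with any one of the remaining stubs
    return sum(count_pairings(rest[:i] + rest[i + 1:]) for i in range(len(rest)))

def double_factorial_odd(two_l):
    """(2L - 1)!! = (2L - 1)(2L - 3)...3*1 for an even number 2L of half-edges."""
    return prod(range(two_l - 1, 0, -2))

for two_l in (2, 4, 6, 8):
    print(two_l, count_pairings(list(range(two_l))), double_factorial_odd(two_l))
```

The two columns agree for every size tried. The matchings counted here still include multiple links and self-loops, which is what the exponential correction accounts for.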

It can be shown that within sparse uncorrelated networks the scale-free networks with
$\gamma \to 2$ are the ones which minimize $\Sigma_1^{uncorr}$ [22]. For correlated
networks with natural correlations the entropy $\Sigma_1$ of the configuration model
decreases as the power-law exponent $\gamma$ decreases. In figure 1 we plot the entropy
of a scale-free network with natural cutoff and fixed average connectivity
$\langle k \rangle = 6, 8, 10$. The entropy $\Sigma_1$ of the configuration model
decreases with decreasing power-law exponent $\gamma$, reaching its minimum at
$\gamma \to 2$. This indicates that scale-free networks with a low value of $\gamma$
present a higher level of ordering with respect to random homogeneous networks.

Fig. 1: The entropy $\Sigma_1^{und}$ of a configuration model of a network with
power-law degree distribution, $N = 10^{[\ldots]}$ nodes and a fixed average
connectivity $\langle k \rangle = 6, 8, 10$ as a function of the power-law exponent
$\gamma$.

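The volume of a network ensemble with fixed degree correlations. The second order of
approximation is to take into consideration degree correlations beyond the 'natural
correlations' of the configuration model. The partition function for this ensemble is
given by

$$ Z_2 = \sum_{\{a_{ij}\}} \prod_i \delta\Big(k_i - \sum_j a_{ij}\Big) e^{\sum_{i<j} h_{ij} a_{ij}} \prod_{k=1}^{K} \delta\Big(k_{nn}(k) k N_k - \sum_{i,j} k_j a_{ij} \delta(k_i - k)\Big) \qquad (14) $$

where K is the maximal connectivity in the network. Expressing the deltas in integral
form we get for the partition function

$$ Z_2 = \int \mathcal{D}\omega \int \mathcal{D}A \, e^{-\sum_i \omega_i k_i - \sum_k A_k k_{nn}(k) k N_k} \prod_{i<j} \left(1 + e^{\omega_i + \omega_j + h_{ij} + k_j A_{k_i} + k_i A_{k_j}}\right) \qquad (15) $$

with $\mathcal{D}A = \prod_k dA_k/(2\pi)$.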



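This expression can be evaluated as in the calculation of $\Sigma_1$, where the
Lagrange multipliers $\omega_i^*$ and $A_k^*$ satisfy

$$
\begin{aligned}
k_i &= \sum_{j \ne i} \frac{e^{\omega_i^* + \omega_j^* + k_j A_{k_i}^* + k_i A_{k_j}^*}}{1 + e^{\omega_i^* + \omega_j^* + k_j A_{k_i}^* + k_i A_{k_j}^*}} \\
k_{nn}(k) k N_k &= \sum_i \delta(k_i - k) \sum_{j \ne i} k_j \frac{e^{\omega_i^* + \omega_j^* + k_j A_{k_i}^* + k_i A_{k_j}^*}}{1 + e^{\omega_i^* + \omega_j^* + k_j A_{k_i}^* + k_i A_{k_j}^*}}
\end{aligned}
\qquad (16)
$$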
If we solve these equations for the degree sequence and nearest neighbor average degree
of a given real network, we can then construct other networks in the same ensemble just
by drawing a link (i, j) with probability

$$ p_{ij}^{(2)} = \frac{e^{\omega_i^* + \omega_j^* + k_j A_{k_i}^* + k_i A_{k_j}^*}}{1 + e^{\omega_i^* + \omega_j^* + k_j A_{k_i}^* + k_i A_{k_j}^*}}. \qquad (17) $$
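Generating a network from a matrix of link probabilities amounts to one independent random draw per pair of nodes. The sketch below assumes the probabilities have already been computed and stored in a symmetric nested list `p`. The function name and the uniform example matrix are hypothetical.

```python
import random

def sample_network(p, rng=None):
    """Draw a symmetric 0/1 adjacency matrix with one independent trial per pair i < j."""
    rng = rng or random.Random()
    n = len(p)
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):                 # no self-loops, each pair once
            if rng.random() < p[i][j]:
                a[i][j] = a[j][i] = 1
    return a

# Hypothetical 4-node example with the same link probability for every pair
p = [[0.5] * 4 for _ in range(4)]
a = sample_network(p, rng=random.Random(0))
for row in a:
    print(row)
print("degrees of the sampled network:", [sum(row) for row in a])
```

Because the links are drawn independently, each sampled network reproduces the imposed degrees and correlations only on average, not exactly.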



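The entropy of this ensemble is approximately equal in the large network limit to

$$ N \Sigma_2^{und} \simeq -\sum_i \omega_i^* k_i - \sum_k A_k^* k_{nn}(k) k N_k + \sum_{i<j} \ln\left(1 + e^{\omega_i^* + \omega_j^* + A_{k_i}^* k_j + A_{k_j}^* k_i}\right) - \frac{1}{2} \sum_i \ln(2\pi\alpha_i) - \frac{1}{2} \sum_k \ln(2\pi\alpha_k) \qquad (18) $$

with $\alpha_i$, $\alpha_k$ defined as

$$
\begin{aligned}
\alpha_i &= \sum_{j \ne i} \frac{e^{\omega_i^* + \omega_j^* + k_j A_{k_i}^* + k_i A_{k_j}^*}}{\left(1 + e^{\omega_i^* + \omega_j^* + k_j A_{k_i}^* + k_i A_{k_j}^*}\right)^2} \\
\alpha_k &= \sum_i \delta(k_i - k) \sum_j \ldots
\end{aligned}
\qquad (19)
$$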
In table 1 we report the entropy for different undirected network [23] ensembles at
different levels of approximation. We consider the Internet at the Autonomous System
level, the protein interaction network of S. cerevisiae (DIP database) and the partial
map of the protein interaction network of H. sapiens [24]. We observe that for these
networks taking into account the degree distribution strongly reduces the entropy of the
randomized network ensemble. Moreover, explicitly taking into account degree-degree
correlations fine-tunes the value of the entropy of the randomized ensemble.

The volume of network ensemble with given community structure. A different ensemble of
networks is the ensemble of networks with given community structure and degree sequence.
Suppose that we have a network and we detect Q communities such that each node
$i = 1, \ldots, N$ belongs to the community $q_i = 1, \ldots, Q$ with Q finite. To find
a randomized ensemble of networks with the given community structure we impose that the
nodes have a fixed degree sequence and a fixed number $A(q, q')$ of links between the
communities q and q'. In an undirected network, $A(q, q')$ is given by the following
expression

$$ A(q, q') = \sum_{i<j} \delta(q_i - q) \delta(q_j - q') a_{ij}. \qquad (20) $$
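The link counts between communities are simple to extract from an adjacency matrix and a community assignment. The sketch below makes two assumptions of our own: communities are labelled 0 to Q-1 rather than 1 to Q, and the count for two different communities is stored for the unordered pair, i.e. symmetrically. The function name and the 4-node example are hypothetical.

```python
def community_link_counts(a, q, n_communities):
    """Return a symmetric Q x Q table of link counts within and between communities."""
    counts = [[0] * n_communities for _ in range(n_communities)]
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):                 # visit each undirected link once
            if a[i][j]:
                qi, qj = q[i], q[j]
                counts[qi][qj] += 1
                if qi != qj:
                    counts[qj][qi] += 1           # keep the table symmetric
    return counts

# Hypothetical example: the path 0-1-2-3 split into communities {0, 1} and {2, 3}
a = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
q = [0, 0, 1, 1]
print(community_link_counts(a, q, 2))             # one link inside each community, one between
```

The diagonal entries hold the links inside each community and the off-diagonal entries the links between two communities, so the upper triangle including the diagonal sums to L.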



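Following the same steps as in the previous case we find that the entropy for such an
ensemble is given by

$$ N \Sigma_c^{und} \simeq -\sum_i \omega_i^* k_i - \sum_{q \le q'} w_{q,q'}^* A(q, q') + \sum_{i<j} \ln\left(1 + e^{\omega_i^* + \omega_j^* + w_{q_i,q_j}^*}\right) - \frac{1}{2} \sum_i \ln(2\pi\alpha_i) - \frac{1}{2} \sum_{q \le q'} \ln(2\pi\alpha_{q,q'}) \qquad (21) $$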
                 N       L       $\Sigma_0^{und}$   $\Sigma_1^{und}$   $\Sigma_2^{und}$
AS-97-11         3015    5156    13.3               7.5                7.3
AS-98-10         4180    7768    14.9               8.6                8.4
AS-99-10         5861    11312   16.1               9.2                9.0
AS-00-10         8836    17822   17.5               9.8                9.6
AS-01-03         10515   21455   18.1               10.1               9.8
Yeast DIP        4135    8099    15.6               12.3               11.1
H. Sapiens PI    3134    6726    16.3               12.3               12.2

Table 1: Entropies of randomized network ensembles starting from real undirected
networks with N nodes and L links. $\Sigma_0^{und}$, $\Sigma_1^{und}$, $\Sigma_2^{und}$
indicate the entropy of an undirected network ensemble with assigned N nodes and L
links, with given degree sequence, and with given degree sequence and degree
correlations respectively. The data sets [23] "AS-year-month" indicate different
snapshots of the Internet at the Autonomous System level, the yeast DIP dataset is the
protein interaction network of S. cerevisiae and H. Sapiens PI is the partial human
protein interaction map [24].



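with the Lagrangian multipliers $\{\omega_i^*\}$, $\{w_{q,q'}^*\}$ satisfying the saddle
point equations

$$
\begin{aligned}
k_i &= \sum_{j \ne i} \frac{e^{\omega_i^* + \omega_j^* + w_{q_i,q_j}^*}}{1 + e^{\omega_i^* + \omega_j^* + w_{q_i,q_j}^*}} \\
A(q, q') &= \sum_{i<j} \delta(q_i - q) \delta(q_j - q') \frac{e^{\omega_i^* + \omega_j^* + w_{q_i,q_j}^*}}{1 + e^{\omega_i^* + \omega_j^* + w_{q_i,q_j}^*}}
\end{aligned}
\qquad (22)
$$

and $\alpha_i$, $\alpha_{q,q'}$ defined as

$$
\begin{aligned}
\alpha_i &= \sum_{j \ne i} \frac{e^{\omega_i^* + \omega_j^* + w_{q_i,q_j}^*}}{\left(1 + e^{\omega_i^* + \omega_j^* + w_{q_i,q_j}^*}\right)^2} \\
\alpha_{q,q'} &= \sum_{i<j} \delta(q_i - q) \delta(q_j - q') \frac{e^{\omega_i^* + \omega_j^* + w_{q_i,q_j}^*}}{\left(1 + e^{\omega_i^* + \omega_j^* + w_{q_i,q_j}^*}\right)^2}
\end{aligned}
\qquad (23)
$$

The probability for a link between node i and j is equal to

$$ p_{ij}^{(c)} = \frac{e^{\omega_i^* + \omega_j^* + w_{q_i,q_j}^*}}{1 + e^{\omega_i^* + \omega_j^* + w_{q_i,q_j}^*}}. \qquad (24) $$

In the case of the Zachary club [25] we were able to calculate $\Sigma_1^{und} = 3.94$
and $\Sigma_c^{und} = 3.25$, quantifying the amount of information present in the known
community partition.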

Directed networks. - An undirected network is determined by a symmetric adjacency
matrix, while the matrix of a directed network is in general non-symmetric. Consequently
a directed network has more degrees of freedom than an undirected network. If we
consider the number of directed networks $\mathcal{N}_0^{dir}$ with given number of
nodes N and of directed links L we find

$$ \mathcal{N}_0^{dir} = \binom{N(N-1)}{L}. \qquad (25) $$
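The directed count can be evaluated in the same way as the undirected binomial, with ordered instead of unordered node pairs. In the sketch below the function names are ours, and the sizes in the usage line (183 nodes, 2,494 links) are those that survive for the Littlerock food web in table 2.

```python
from math import lgamma

def log_binomial(m, l):
    """ln of the binomial coefficient C(m, l), via log-gamma to avoid overflow."""
    return lgamma(m + 1) - lgamma(l + 1) - lgamma(m - l + 1)

def entropy_directed_gnl(n_nodes, n_links):
    """Sigma_0^dir = ln C(N(N-1), L) / N for directed networks without self-loops."""
    n_pairs = n_nodes * (n_nodes - 1)             # ordered pairs (i, j) with i != j
    return log_binomial(n_pairs, n_links) / n_nodes

def entropy_undirected_gnl(n_nodes, n_links):
    """Sigma_0^und = ln C(N(N-1)/2, L) / N, shown for comparison."""
    n_pairs = n_nodes * (n_nodes - 1) // 2        # unordered pairs
    return log_binomial(n_pairs, n_links) / n_nodes

print(entropy_directed_gnl(183, 2494))
print(entropy_undirected_gnl(183, 2494))          # smaller: fewer degrees of freedom
```

For the same N and L the directed value is the larger of the two, in line with the larger number of degrees of freedom of a non-symmetric adjacency matrix.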
Volume of randomized directed network ensembles with given degree sequence. To
calculate the volume of directed networks with a given sequence of in/out degrees
$\{k_i^{(out)}, k_i^{(in)}\}$ we just have to impose the constraints on the incoming and
outgoing connectivities,

$$ Z_1^{dir} = \sum_{\{a_{ij}\}} \prod_i \delta\Big(k_i^{(out)} - \sum_j a_{ij}\Big) \prod_i \delta\Big(k_i^{(in)} - \sum_j a_{ji}\Big) \exp\Big[\sum_{i \ne j} h_{ij} a_{ij}\Big]. \qquad (26) $$



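Following the same approach as for the undirected case, we find that the entropy of
this ensemble of networks is given by

$$ N \Sigma_1^{dir} \simeq -\sum_i \omega_i^* k_i^{(out)} - \sum_i \hat{\omega}_i^* k_i^{(in)} + \sum_{i \ne j} \ln\left(1 + e^{\omega_i^* + \hat{\omega}_j^*}\right) - \frac{1}{2} \sum_i \ln\left((2\pi)^2 \alpha_i^{(out)} \alpha_i^{(in)}\right) \qquad (27) $$

with the Lagrangian multipliers $\omega_i^*$, $\hat{\omega}_i^*$ satisfying the saddle
point equations

$$ k_i^{(out)} = \sum_{j \ne i} \frac{e^{\omega_i^* + \hat{\omega}_j^*}}{1 + e^{\omega_i^* + \hat{\omega}_j^*}}, \qquad k_i^{(in)} = \sum_{j \ne i} \frac{e^{\omega_j^* + \hat{\omega}_i^*}}{1 + e^{\omega_j^* + \hat{\omega}_i^*}} \qquad (28) $$

with

$$ \alpha_i^{(out)} = \sum_{j \ne i} \frac{e^{\omega_i^* + \hat{\omega}_j^*}}{\left(1 + e^{\omega_i^* + \hat{\omega}_j^*}\right)^2}, \qquad \alpha_i^{(in)} = \sum_{j \ne i} \frac{e^{\omega_j^* + \hat{\omega}_i^*}}{\left(1 + e^{\omega_j^* + \hat{\omega}_i^*}\right)^2}. \qquad (29) $$

The probability for a directed link from i to j is given by

$$ p_{ij}^{(1,dir)} = \frac{e^{\omega_i^* + \hat{\omega}_j^*}}{1 + e^{\omega_i^* + \hat{\omega}_j^*}}. \qquad (30) $$

If $e^{\omega_i^* + \hat{\omega}_j^*} \ll 1$ for all $i, j = 1, \ldots, N$, the directed
network becomes uncorrelated and we have
$p_{ij}^{(1,dir)} = k_i^{(out)} k_j^{(in)}/(\langle k_{in} \rangle N)$. Given this
solution the condition for having uncorrelated directed networks is that the maximal
in-degree $K^{(in)}$ and the maximal out-degree $K^{(out)}$ should satisfy
$K^{(in)} K^{(out)}/(\langle k_{in} \rangle N) < 1$. The entropy of the directed
uncorrelated network is then given by

$$ N \Sigma_{1,uncorr}^{dir} \simeq \ln(\langle k_{in} \rangle N)! - \sum_i \ln\left(k_i^{(in)}! \, k_i^{(out)}!\right) - \frac{1}{2} \frac{\langle k_{in}^2 \rangle \langle k_{out}^2 \rangle}{\langle k_{in} \rangle \langle k_{out} \rangle} \qquad (31) $$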



which has a clear combinatorial interpretation, as in the undirected case. In table 2
we report the entropy of directed networks and of their undirected versions, observing
that different degree distributions reduce the entropy of randomized network ensembles
by different amounts, some carrying more information than others.

Conclusions. - In conclusion we have studied the space of possible networks in
randomized models of complex networks. We have found that random scale-free network
ensembles with low power-law exponent $\gamma$ have a lower entropy than random
networks with a homogeneous degree distribution. The successive random approximations
of a real graph characterize to what extent the degree sequence, the degree-degree
correlations or the community structure constrain the network. We have evaluated the
entropy of randomized ensembles starting from a set of different real directed and
undirected networks, showing how much each structural feature reduces the space of
possible networks. Future work will focus on extending these results to weighted
networks and on measuring large deviations in ensembles of random networks with hidden
variables.
Table 2 (only partially recovered): entropies of directed networks and of their
undirected versions ($\Sigma^{und}$).

Littlerock FW: N = 183, L = 2,494, 48.4
Neural net.: 22.5
Power-grid net.: 8.7
ND WWW